Abstract

We study the smoothed log-concave maximum likelihood estimator of a probability distribution on $\mathbb{R}^d$. This is a fully automatic nonparametric density estimator, obtained as a canonical smoothing of the log-concave maximum likelihood estimator. We demonstrate its attractive features both through an analysis of its theoretical properties and a simulation study. Moreover, we use our methodology to develop a new test of log-concavity, and show how the estimator can be used as an intermediate stage of more involved procedures, such as constructing a classifier or estimating a functional of the density. Here again, the use of these procedures can be justified both on theoretical grounds and through their finite sample performance, and we illustrate their use in a breast cancer diagnosis (classification) problem.

Key words: Classification; Functional estimation; Log-concave maximum likelihood estimation; Testing log-concavity; Smoothing



1 Introduction 

Maximum likelihood estimation of shape-constrained densities has received a great deal of interest recently. The allure is the prospect of obtaining fully automatic nonparametric estimators, with no tuning parameters to choose. The general idea dates back to Grenander (1956), who derived the maximum likelihood estimator of a decreasing density on $[0, \infty)$. A characteristic feature of these shape-constrained maximum likelihood estimators is that they are not smooth. For instance, the Grenander estimator has discontinuities at some of the data points. The maximum likelihood estimator of a multi-dimensional log-concave density is the exponential of what Cule, Samworth and Stewart (2010) call a tent function; it may have several ridges. Moreover, in this example (and others), the estimator drops discontinuously to zero outside the convex hull of the data.

In some applications, the lack of smoothness may not be a drawback in itself. However, 
in other circumstances, a smooth estimate might be preferred, because: 

(a) it has a more attractive visual appearance, without ridges or discontinuities that might 
be difficult to justify to a practitioner; 

(b) it has the potential to offer substantially improved estimation performance, particularly 
for small sample sizes, where the convex hull of the data is likely to be rather small; 

(c) for certain applications, e.g. classification, the maximum likelihood estimator being zero
outside the convex hull of the data may present problems; see Section 4.1 for further
discussion.

For these reasons, we investigate a smoothed version of the $d$-dimensional log-concave maximum likelihood estimator. The smoothing is achieved by a convolution with a Gaussian density, which preserves the log-concavity shape constraint. To decide how much to smooth, we exploit an interesting property of the log-concave maximum likelihood estimator, which provides a canonical choice of covariance matrix for the Gaussian density, thereby retaining the fully automatic nature of the estimate. The basic idea, which was introduced by Dümbgen and Rufibach (2009, 2011) for the case $d = 1$ and touched upon in Cule, Samworth and Stewart (2010), is described in greater detail in Section 2.1.

The challenge of computing the estimator, which involves a $d$-dimensional convolution integral, is taken up in Section 2.2; see Figure 1 for an illustration of the estimates obtained. The theoretical properties of the smoothed log-concave estimator are studied in Section 2.3. Our framework handles both cases where the log-concavity assumption holds and where it is violated. In Section 2.4 we present new results on the infinite-dimensional projection from a probability distribution on $\mathbb{R}^d$ to its closest log-concave approximation; these give further insight into the misspecified setting. A simulation study follows in Section 2.5, confirming the excellent finite-sample performance.

In Section 3 we introduce a new hypothesis test of log-concavity of multivariate distributions based on our choice of covariance matrix for the Gaussian density. This test is consistent, easy to implement, and has much improved finite-sample performance compared to existing methods. Section 4 is devoted to applications of the smoothed log-concave maximum likelihood estimator to classification and other functional estimation problems. We provide theory, under both correct and incorrect model specification, for the performance of the resulting procedures in these cases. The classification methodology is applied to the Wisconsin breast cancer data set, where the aim is to aid the diagnosis of future potential breast cancer instances. All proofs are deferred to the Appendix.

Figure 1: Density estimates based on $n = 200$ observations, plotted as dots, from a standard bivariate normal distribution: (a) log-concave maximum likelihood estimator; (b) smoothed log-concave maximum likelihood estimator.



2 The smoothed log-concave maximum likelihood estimator



2.1 Definition and basic properties 

Let $\mathcal{P}$ denote the set of all probability distributions $P$ on $\mathbb{R}^d$ such that $P(H) < 1$ for all hyperplanes $H$. In this section, we assume that $X_1, X_2, \ldots$ are independent random vectors in $\mathbb{R}^d$ with distribution $P_0 \in \mathcal{P}$. In that case, for sufficiently large $n$ the convex hull of the data, denoted $C_n = \mathrm{conv}(X_1, \ldots, X_n)$, is $d$-dimensional with probability 1. It is then known that there exists a unique log-concave density $\hat f_n$ that maximises the likelihood function

$$L(f) = \prod_{i=1}^n f(X_i)$$

over all log-concave densities $f$. The estimator is supported on $C_n$, and $\log \hat f_n$ is piecewise affine on this set. More precisely, there exists an index set $J$ consisting of $(d+1)$-tuples $j = (j_0, \ldots, j_d)$ of distinct indices in $\{1, \ldots, n\}$, such that $C_n$ can be triangulated into simplices $C_{n,j} = \mathrm{conv}(X_{j_0}, \ldots, X_{j_d})$ in such a way that

$$\log \hat f_n(x) = \begin{cases} b_j^\top x - \beta_j & \text{if } x \in C_{n,j}, \\ -\infty & \text{otherwise,} \end{cases}$$

for some vectors $\{b_j : j \in J\}$ in $\mathbb{R}^d$ and real numbers $\{\beta_j : j \in J\}$. Such a function was called a tent function in Cule, Samworth and Stewart (2010) because when $d = 2$ one can think of associating a 'tent pole' with each observation, extending vertically out of the plane. For certain tent pole heights, the graph of $\log \hat f_n$ is then the roof of a taut tent stretched over the tent poles.



Despite the attractive asymptotic properties of $\hat f_n$ derived in the papers cited in the Introduction, the simulation results in Cule, Samworth and Stewart (2010) and Chen (2010) indicate that the finite-sample performance is only strong relative to competitors (e.g. kernel-based methods) for moderate or large sample sizes (say $n > 500$). It appears that for smaller values of $n$, the convex hull of the data is typically not large enough for good performance.

The idea for fully automatic smoothing of the log-concave maximum likelihood estimator comes from the following observation: Remark 2.3 of Dümbgen, Samworth and Schuhmacher (2011) (see also Corollary 2.3 of Dümbgen and Rufibach (2009)) shows that while the log-concave maximum likelihood estimator is a good estimator of the first moment of $P_0$, it underestimates the covariance matrix. More precisely, we have that

$$\int_{\mathbb{R}^d} x \hat f_n(x)\,dx = \frac{1}{n} \sum_{i=1}^n X_i = \bar X,$$

say. On the other hand, however,

$$\tilde\Sigma = \int_{\mathbb{R}^d} (x - \bar X)(x - \bar X)^\top \hat f_n(x)\,dx \le \frac{1}{n} \sum_{i=1}^n (X_i - \bar X)(X_i - \bar X)^\top < \frac{1}{n-1} \sum_{i=1}^n (X_i - \bar X)(X_i - \bar X)^\top = \hat\Sigma. \tag{2.1}$$

Here, $A \le B$ and $A < B$ mean the matrix $B - A$ is non-negative definite and positive definite respectively.

This allows us to define our modified estimator, which we call the smoothed log-concave maximum likelihood estimator and denote $\tilde f_n$. It is given by

$$\tilde f_n = \hat f_n * \phi_{d,\hat A}, \tag{2.2}$$

where $\phi_{d,\hat A}$ is the $d$-variate normal density with zero mean and covariance matrix $\hat A = \hat\Sigma - \tilde\Sigma$.

The basic properties of $\tilde f_n$ are summarised in the proposition below.

Proposition 1. Let $P_0 \in \mathcal{P}$, and let $\tilde f_n$ denote the smoothed log-concave maximum likelihood estimator based on independent observations $X_1, \ldots, X_n$ having distribution $P_0$. Then

(a) $\tilde f_n$ is log-concave;

(b) the support of $\tilde f_n$ is $\mathbb{R}^d$;

(c) $\tilde f_n$ is a real analytic function on $\mathbb{R}^d$ (in particular, it is infinitely differentiable);

(d) the mean and covariance matrix corresponding to $\tilde f_n$ agree with the sample mean and sample covariance matrix: $\int_{\mathbb{R}^d} x \tilde f_n(x)\,dx = \bar X$ and $\int_{\mathbb{R}^d} (x - \bar X)(x - \bar X)^\top \tilde f_n(x)\,dx = \hat\Sigma$.
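
Part (d) can be checked directly from the convolution structure of the estimator. The short calculation below is a sketch, not the proof given in the Appendix. It writes $\hat f_n$ for the log-concave maximum likelihood estimator, $\tilde\Sigma$ for the covariance matrix of $\hat f_n$, $\hat\Sigma$ for the sample covariance matrix and $\hat A = \hat\Sigma - \tilde\Sigma$ for the covariance matrix of the Gaussian smoother; the auxiliary random vectors $X^*$ and $U$ are introduced purely for illustration.

```latex
% Let X^* have density \hat f_n and let U ~ N_d(0, \hat A) be independent of X^*.
% Then X^* + U has density \hat f_n * \phi_{d,\hat A} = \tilde f_n, and
\begin{align*}
\mathbb{E}(X^* + U) &= \mathbb{E}(X^*) + \mathbb{E}(U) = \bar X + 0 = \bar X, \\
\operatorname{Cov}(X^* + U) &= \operatorname{Cov}(X^*) + \operatorname{Cov}(U)
  = \tilde\Sigma + \hat A
  = \tilde\Sigma + (\hat\Sigma - \tilde\Sigma)
  = \hat\Sigma .
\end{align*}
```

In other words, $\hat A$ is exactly the amount of extra covariance needed to bring the second moment of the fitted density up to that of the data.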

2.2 Computational issues 

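The aim of this section is to describe algorithms for computing the smoothed log-concave maximum likelihood estimator $\tilde f_n$. As a preliminary step, we need to compute the covariance matrix $\hat A$ of the multivariate normal distribution used in the convolution (2.2),

$$\tilde f_n = \hat f_n * \phi_{d,\hat A},$$

where $\phi_{d,\hat A}$ is the $d$-variate normal density with zero mean and covariance matrix $\hat A = \hat\Sigma - \tilde\Sigma$. Note that the level of smoothing is automatically determined through the matrix $\hat A$.

2.2.1 Computation of the covariance matrix $\hat A$

Recall that $\hat A = \hat\Sigma - \tilde\Sigma$, where $\hat\Sigma$ is the sample covariance matrix, and

$$\tilde\Sigma = \int_{\mathbb{R}^d} x x^\top \hat f_n(x)\,dx - \bar X \bar X^\top = \sum_{j \in J} \int_{C_{n,j}} x x^\top \exp(b_j^\top x - \beta_j)\,dx - \bar X \bar X^\top. \tag{2.3}$$

We make an affine transformation of each of the regions of integration onto the unit simplex. Recall that $C_{n,j} = \mathrm{conv}(X_{j_0}, \ldots, X_{j_d})$, set $D_j = \det[X_{j_1} - X_{j_0}, X_{j_2} - X_{j_0}, \ldots, X_{j_d} - X_{j_0}]$, and let $U_d = \{u = (u_1, \ldots, u_d) \in [0, \infty)^d : \sum_{l=1}^d u_l \le 1\}$ be the unit simplex in $\mathbb{R}^d$. Following Cule and Dümbgen (2008), we further define the auxiliary functions $J_d : \mathbb{R}^{d+1} \to \mathbb{R}$ by

$$J_d(y_0, y_1, \ldots, y_d) = \int_{U_d} \exp\Bigl(\sum_{l=0}^d u_l y_l\Bigr)\,du_1 \cdots du_d,$$

where $u_0 = 1 - \sum_{l=1}^d u_l$. Then, writing $y_{j_l} = \log \hat f_n(X_{j_l})$, we have

$$\begin{aligned}
\int_{C_{n,j}} x x^\top \exp(b_j^\top x - \beta_j)\,dx
&= |D_j| \int_{U_d} \Bigl(\sum_{l=0}^d u_l X_{j_l}\Bigr) \Bigl(\sum_{l'=0}^d u_{l'} X_{j_{l'}}\Bigr)^\top \exp\Bigl(\sum_{l=0}^d u_l y_{j_l}\Bigr)\,du_1 \cdots du_d \\
&= |D_j| \sum_{l=0}^d \sum_{l'=0}^d X_{j_l} X_{j_{l'}}^\top \frac{\partial^2 J_d(y_{j_0}, y_{j_1}, \ldots, y_{j_d})}{\partial y_{j_l}\,\partial y_{j_{l'}}} \\
&= |D_j| \Bigl\{ \sum_{l=0}^d \sum_{l'=0}^d X_{j_l} X_{j_{l'}}^\top J_{d+2}(y_{j_0}, y_{j_1}, \ldots, y_{j_d}, y_{j_l}, y_{j_{l'}}) + \sum_{l=0}^d X_{j_l} X_{j_l}^\top J_{d+2}(y_{j_0}, y_{j_1}, \ldots, y_{j_d}, y_{j_l}, y_{j_l}) \Bigr\}.
\end{aligned}$$

We have applied the basic results of Cule and Dümbgen (2008) in the last step. An exact expression for $J_{d+2}(\cdot)$ is given in Appendix B.1 of Cule, Samworth and Stewart (2010) when its arguments are non-zero and distinct. The Taylor approximation of Cule and Dümbgen (2008) can be used when some of the arguments are small or have similar (or equal) values.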



2.2.2 Computation of the smoothed log-concave maximum likelihood estimator 

We have

$$\tilde f_n(x_0) = \sum_{j \in J} \int_{C_{n,j}} \exp(b_j^\top x - \beta_j)\, \frac{1}{(2\pi)^{d/2} (\det \hat A)^{1/2}} \exp\Bigl\{-\tfrac{1}{2} (x_0 - x)^\top \hat A^{-1} (x_0 - x)\Bigr\}\,dx.$$

By making an affine transformation of each $C_{n,j}$ onto the unit simplex as in Section 2.2.1, we reduce the problem to integrating the exponential of a quadratic polynomial over the unit simplex. In general, this has no explicit solution, so it has to be evaluated numerically.



Stroud (1971) gives a brief introduction to the problem of evaluating integrals over the unit simplex, while Grundmann and Möller (1978) proposed a combinatorial method. We apply their method, first noting that by integrating out one variable, the dimensionality of the integral can be reduced by one. To see this, consider any $d \times d$ positive definite, symmetric matrix $A = [a_{ii'}]$, any vector $B = (b_1, \ldots, b_d)^\top \in \mathbb{R}^d$ and any constant $c \in \mathbb{R}$. Writing $\Phi(\cdot)$ for the standard normal distribution function, $u = (u_1, \ldots, u_d)^\top$ and $u_0 = 1 - \sum_{i=1}^d u_i$, we have

$$\begin{aligned}
&\int_0^1 \int_0^{1-u_1} \cdots \int_0^{1-u_1-\cdots-u_{d-1}} \exp\{-(u^\top A u + B^\top u + c)\}\,du_d \cdots du_2\,du_1 \\
&\quad= \int_0^1 \int_0^{1-u_1} \cdots \int_0^{1-u_1-\cdots-u_{d-1}} \exp\{-(a' u_d^2 + b' u_d + c')\}\,du_d \cdots du_2\,du_1 \\
&\quad= \int_0^1 \int_0^{1-u_1} \cdots \int_0^{1-u_1-\cdots-u_{d-2}} \sqrt{\frac{\pi}{a'}}\, \exp\Bigl(\frac{b'^2}{4a'} - c'\Bigr) \Bigl\{ \Phi\Bigl(\sqrt{2a'}\,\Bigl(1 - \sum_{i=1}^{d-1} u_i\Bigr) + \frac{b'}{\sqrt{2a'}}\Bigr) - \Phi\Bigl(\frac{b'}{\sqrt{2a'}}\Bigr) \Bigr\}\,du_{d-1} \cdots du_2\,du_1.
\end{aligned} \tag{2.4}$$

Here, $a'$, $b'$ and $c'$ are defined by

$$a' = a_{dd}, \qquad b' = b_d + 2 \sum_{i=1}^{d-1} a_{di} u_i \qquad \text{and} \qquad c' = u_{-d}^\top [a_{ii'}]_{1 \le i, i' \le d-1}\, u_{-d} + \sum_{i=1}^{d-1} b_i u_i + c,$$

where $u_{-d} = (u_1, \ldots, u_{d-1})^\top$. It follows that we can use the combinatorial method to integrate over the $(d-1)$-dimensional unit simplex. Some special cases include:

(a) $d = 1$. In this case, (2.4) is a simple function of $\Phi(\cdot)$, and the smoothed log-concave maximum likelihood estimator can be computed straightforwardly. This method is implemented in the R package logcondens (Rufibach and Dümbgen, 2006; Dümbgen and Rufibach, 2011).

(b) $d = 2$. In this case, (2.4) is an integral over $[0, 1]$, and other standard numerical integration methods, such as the Gaussian quadrature rule, can be applied.
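
The one-variable reduction behind (2.4) rests on completing the square: for $a > 0$, the integral of $\exp\{-(a u^2 + b u + c)\}$ over $[0, s]$ equals $\sqrt{\pi/a}\,\exp\{b^2/(4a) - c\}\,[\Phi(\sqrt{2a}\,s + b/\sqrt{2a}) - \Phi(b/\sqrt{2a})]$. The following Python sketch checks this identity numerically against a composite Simpson rule. The function names and the parameter values are illustrative assumptions and are not taken from logcondens or LogConcDEAD.

```python
import math


def std_normal_cdf(z):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def inner_integral_closed_form(a, b, c, upper):
    """Closed form of the integral of exp(-(a*u**2 + b*u + c)) over [0, upper], a > 0."""
    scale = math.sqrt(math.pi / a) * math.exp(b * b / (4.0 * a) - c)
    hi = std_normal_cdf(math.sqrt(2.0 * a) * upper + b / math.sqrt(2.0 * a))
    lo = std_normal_cdf(b / math.sqrt(2.0 * a))
    return scale * (hi - lo)


def inner_integral_simpson(a, b, c, upper, n_intervals=2000):
    """Composite Simpson approximation of the same integral (n_intervals even)."""
    h = upper / n_intervals

    def integrand(u):
        return math.exp(-(a * u * u + b * u + c))

    total = integrand(0.0) + integrand(upper)
    for k in range(1, n_intervals):
        total += (4 if k % 2 else 2) * integrand(k * h)
    return total * h / 3.0


# Illustrative (made-up) coefficients; any a > 0 works.
closed = inner_integral_closed_form(a=1.5, b=-0.7, c=0.2, upper=0.6)
numeric = inner_integral_simpson(a=1.5, b=-0.7, c=0.2, upper=0.6)
print(closed, numeric)
```

The closed form multiplies a potentially large factor $\exp\{b^2/(4a)\}$ by a small difference of normal probabilities, so in floating point it can lose accuracy when $a$ is close to zero.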

The combinatorial method and its variations are implemented in the latest version of the R package LogConcDEAD (Cule et al., 2007; Cule, Gramacy and Samworth, 2009). We found this method to be numerically stable even with several thousand observations, when $\det \hat A$ may be rather small (note that in such cases, $a'$ in (2.4) will typically not be close to zero). However, we briefly present below two other ways of computing $\tilde f_n(x_0)$; while slower in most cases, they do not require the inversion of $\hat A$, so can be used even when $\det \hat A$ is very small.

(a) Monte Carlo method.

(1) Conditional on $X_1, \ldots, X_n$, generate independent random vectors $X_1^*, \ldots, X_B^*$ from the $N_d(x_0, \hat A)$ distribution.

(2) Approximate $\tilde f_n(x_0)$ by $\frac{1}{B} \sum_{b=1}^B \hat f_n(X_b^*)$.

The validity of this approximation follows from the strong law of large numbers, applied conditional on $X_1, \ldots, X_n$.

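(b) Fourier transform. We can take advantage of the convolution property of the Fourier transform $\mathcal{F}$ as follows. First note that

$$\mathcal{F}(\hat f_n)(\xi) = \int_{\mathbb{R}^d} e^{-i \xi^\top x} \hat f_n(x)\,dx = \sum_{j \in J} \int_{C_{n,j}} \exp\{(b_j - i\xi)^\top x - \beta_j\}\,dx,$$

which can be evaluated by extending the auxiliary functions $J_d$ to the complex plane. Since $\mathcal{F}(\tilde f_n)(\xi) = e^{-\xi^\top \hat A \xi / 2}\, \mathcal{F}(\hat f_n)(\xi)$, we can invert $\mathcal{F}(\tilde f_n)$ on a fine grid using the fast Fourier transform.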
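
The Monte Carlo method in (a) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the LogConcDEAD implementation: `f_hat` is any callable returning the value of the unsmoothed estimator at a point, `a_hat` is the smoothing covariance matrix, and the function name, the seed argument and the uniform stand-in density used in the demonstration are all hypothetical.

```python
import numpy as np


def smoothed_density_mc(f_hat, x0, a_hat, n_draws=10_000, seed=0):
    """Monte Carlo approximation of the convolution of f_hat with N(0, a_hat) at x0."""
    rng = np.random.default_rng(seed)
    x0 = np.atleast_1d(np.asarray(x0, dtype=float))
    a_hat = np.atleast_2d(np.asarray(a_hat, dtype=float))
    # Step (1): draw X*_1, ..., X*_B independently from N_d(x0, a_hat).
    draws = rng.multivariate_normal(x0, a_hat, size=n_draws)
    # Step (2): average the unsmoothed density over the draws.
    return float(np.mean([f_hat(x) for x in draws]))


def toy_f_hat(x):
    """Stand-in for the fitted estimator: the uniform density on [0, 1] (d = 1)."""
    return 1.0 if 0.0 <= x[0] <= 1.0 else 0.0


# For this toy case the exact smoothed value at 0.5 is Phi(2.5) - Phi(-2.5), about 0.9876.
estimate = smoothed_density_mc(toy_f_hat, x0=[0.5], a_hat=[[0.04]], n_draws=20_000)
print(estimate)
```

The sketch never inverts `a_hat`: the normal draws only need a factorisation of the covariance matrix, which is why the approach remains usable when $\det \hat A$ is very small.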



2.2.3 Sampling from the fitted density estimate 



Since $\tilde f_n$ is the convolution of $\hat f_n$ and a multivariate normal density, conditional on $X_1, \ldots, X_n$, it is straightforward to draw an observation $X^{**}$ from $\tilde f_n$ as follows:

(a) Draw $X^*$ from $\hat f_n$ using the algorithm described in Appendix B.3 of Cule, Samworth and Stewart (2010) or the algorithm of Gopal and Casella (2010).

(b) Draw $u \sim N_d(0, \hat A)$, independent of $X^*$.

(c) Return $X^{**} = X^* + u$.
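
The three steps translate directly into code once a sampler for the unsmoothed estimator is available. The sketch below assumes such a sampler is supplied by the user as `sample_f_hat(size, rng)`; that interface, the function name and the uniform stand-in sampler are hypothetical, and the algorithms cited in step (a) are not implemented here.

```python
import numpy as np


def sample_smoothed(sample_f_hat, a_hat, size, seed=0):
    """Draw `size` observations from the convolution of f_hat with N(0, a_hat)."""
    rng = np.random.default_rng(seed)
    a_hat = np.atleast_2d(np.asarray(a_hat, dtype=float))
    d = a_hat.shape[0]
    # Step (a): draw X* from the unsmoothed estimator (user-supplied sampler).
    x_star = np.asarray(sample_f_hat(size, rng), dtype=float).reshape(size, d)
    # Step (b): draw u ~ N_d(0, a_hat), independently of X*.
    u = rng.multivariate_normal(np.zeros(d), a_hat, size=size)
    # Step (c): return X** = X* + u.
    return x_star + u


def toy_sampler(size, rng):
    """Stand-in for sampling from the fitted estimator: uniform on the unit square."""
    return rng.uniform(size=(size, 2))


a_hat = np.array([[0.05, 0.01], [0.01, 0.03]])
draws = sample_smoothed(toy_sampler, a_hat, size=50_000)
sample_cov = np.cov(draws, rowvar=False)
print(sample_cov)
```

In the toy run the covariance of the draws is close to the covariance of the stand-in density ($I/12$) plus `a_hat`, which mirrors the way the smoother adds $\hat A$ to the covariance of $\hat f_n$.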



2.3 Theoretical performance 

It is convenient to define, for $r = 1, 2$, the classes of probability distributions on $\mathbb{R}^d$ given by

$$\mathcal{P}_r = \Bigl\{ P \in \mathcal{P} : \int_{\mathbb{R}^d} \|x\|^r\,dP(x) < \infty \Bigr\}.$$

The condition $P_0 \in \mathcal{P}_1$ is necessary and sufficient for the existence of a unique upper semi-continuous log-concave density $f^*$ that maximises $\int \log f\,dP_0$ over all log-concave densities $f$ (Dümbgen, Samworth and Schuhmacher, 2011, Theorem 2.2). In fact, if $P_0$ has a density $f_0$ and provided that $\int f_0 \log f_0 < \infty$ (which is certainly the case if $f_0$ is bounded), $f^*$ minimises the Kullback-Leibler divergence $d_{KL}(f, f_0) = \int f_0 \log(f_0 / f)$ over all log-concave densities $f$. In this sense, $f^*$ is the closest log-concave density to $P_0$.

The density $f^*$ plays an important role in the following theorem, which describes the asymptotic behaviour of the smoothed log-concave maximum likelihood estimator $\tilde f_n$.

Theorem 2. Suppose that $P_0 \in \mathcal{P}_2$, and write $\mu = \int_{\mathbb{R}^d} x\,dP_0(x)$ and $\Sigma = \int_{\mathbb{R}^d} (x - \mu)(x - \mu)^\top dP_0(x)$. Let $f^{**} = f^* * \phi_{d,A^*}$ be the convolution of $f^*$ with the $N_d(0, A^*)$ density, where $A^* = \Sigma - \Sigma^*$ with $\Sigma^* = \int_{\mathbb{R}^d} (x - \mu)(x - \mu)^\top f^*(x)\,dx$. Taking $a_0 > 0$ and $b_0 \in \mathbb{R}$ such that $f^{**}(x) \le e^{-a_0 \|x\| + b_0}$, we have for all $a < a_0$ that

$$\int_{\mathbb{R}^d} e^{a\|x\|}\, |\tilde f_n(x) - f^{**}(x)|\,dx \stackrel{a.s.}{\to} 0$$

and, if $f^{**}$ is continuous, $\sup_{x \in \mathbb{R}^d} e^{a\|x\|}\, |\tilde f_n(x) - f^{**}(x)| \stackrel{a.s.}{\to} 0$.

The condition that $P_0 \in \mathcal{P}_2$ imposed in Theorem 2 ensures the finiteness of $A^*$. We see that in general, $\tilde f_n$ converges to a slightly smoothed version of the closest log-concave density to $P_0$. However, if $P_0$ has a log-concave density $f_0$, then $f_0 = f^* = f^{**}$, so $\tilde f_n$ is strongly consistent in these exponentially weighted total variation and supremum norms. In fact, suppose that $a : \mathbb{R}^d \to \mathbb{R}$ is a sublinear function, i.e. $a(x + y) \le a(x) + a(y)$ and $a(rx) = r a(x)$ for all $x, y \in \mathbb{R}^d$ and $r \ge 0$, satisfying $e^{a(x)} f^{**}(x) \to 0$ as $\|x\| \to \infty$. It can be shown that under the conditions of Theorem 2,

$$\int_{\mathbb{R}^d} e^{a(x)}\, |\tilde f_n(x) - f^{**}(x)|\,dx \stackrel{a.s.}{\to} 0$$

(Schuhmacher, Hüsler and Dümbgen, 2011).

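Despite being smooth and having full support, it turns out that $\tilde f_n$ is rather close to $\hat f_n$. This is quantified in the finite-sample bound below.

Proposition 3. If $x \in C_{n,j}$, and $\hat f_n(x) = \exp(b_j^\top x - \beta_j)$, then

$$\frac{\tilde f_n(x)}{\hat f_n(x)} \le \exp\Bigl(\tfrac{1}{2}\, b_j^\top \hat A\, b_j\Bigr).$$

Moreover,

$$\int_{\mathbb{R}^d} |\tilde f_n - \hat f_n| \le 2\bigl(e^{\lambda_{\max}/2} - 1 + \delta_n\bigr),$$

where $\lambda_{\max} = \max_{j \in J} b_j^\top \hat A\, b_j$, and $\delta_n = \int_{C_n^c} \tilde f_n$.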
2.4 Properties of (smoothed) log-concave approximations

In this subsection, we give new insights into the maps from a probability distribution $P$ to its log-concave approximation $f^*$, and its smoothed version $f^{**}$. Results such as these enhance our understanding of the behaviour of maximum likelihood estimators in non-convex, misspecified models, where existing results are very limited. Theorem 4 below shows that log-concave approximations and their smoothed analogues preserve independence of components. As well as being of use in our simulation studies, this is the key result which underpins a new approach to fitting independent component analysis models using nonparametric maximum likelihood (Samworth and Yuan, 2012).



Theorem 4. Suppose that $P \in \mathcal{P}_1$ is a product measure on $\mathbb{R}^d$, so that $P = P_1 \otimes P_2$, say, where $P_1$ and $P_2$ are probability measures on $\mathbb{R}^{d_1}$ and $\mathbb{R}^{d_2}$ respectively, with $d_2 = d - d_1$. Let $f^*$ denote the log-concave approximation to $P$, and let $f_\ell^*$ denote the log-concave approximation to $P_\ell$, for $\ell = 1, 2$. Then, writing $x = (x_1^\top, x_2^\top)^\top$, where $x_1 \in \mathbb{R}^{d_1}$ and $x_2 \in \mathbb{R}^{d_2}$, we have

$$f^*(x) = f_1^*(x_1)\, f_2^*(x_2).$$

Now suppose further that $P \in \mathcal{P}_2$. Let $f^{**}$ denote the smoothed log-concave approximation to $P$, and let $f_\ell^{**}$ denote the smoothed log-concave approximation to $P_\ell$, for $\ell = 1, 2$. Then, for all $x = (x_1^\top, x_2^\top)^\top \in \mathbb{R}^d$,

$$f^{**}(x) = f_1^{**}(x_1)\, f_2^{**}(x_2).$$

Our next theorem characterises the log-concavity constraint through the trace of the non-negative definite matrix $A^*$ defined in Theorem 2.

Theorem 5. Suppose that $P \in \mathcal{P}_1$. Then $\mathrm{tr}(A^*) = 0$ if and only if $P$ has a log-concave density.

The 'if' part of this statement is well-known, but the 'only if' part is new. The two parts together motivate our testing procedure for log-concavity, which is developed in Section 3.

In most cases, it is very difficult to find explicitly the log-concave approximation $f^*$ to a given distribution $P \in \mathcal{P}_1$. Our final result of this section is straightforward to prove, but is of interest because it shows that some log-concave densities can have a large 'domain of attraction'.

Proposition 6. Let $f^*$ be an upper semi-continuous, log-concave density on $\mathbb{R}^d$. Then the class of distributions $P \in \mathcal{P}_1$ with log-concave approximation $f^*$ is convex.

For instance, if $f(x; \alpha, \sigma) = \frac{\alpha \sigma^\alpha}{2(|x| + \sigma)^{\alpha+1}}$ is a symmetrised Pareto density with $\alpha > 1$ and $\sigma > 0$, then it can be shown that its log-concave projection is $f^*(x; \alpha, \sigma) = \frac{\alpha - 1}{2\sigma} \exp\{-(\alpha - 1)|x|/\sigma\}$. Thus the class of distributions whose log-concave projection is the standard Laplace density is infinite-dimensional.

2.5 Finite sample performance 

Our simulation study considered the normal location mixture density $f(\cdot) = 0.4\,\phi_d(\cdot) + 0.6\,\phi_d(\cdot - \mu)$ for $\|\mu\| = 1, 2$ and $3$, where $\phi_d = \phi_{d,I}$. This mixture density is log-concave if and only if $\|\mu\| \le 2$. For each density, for $d = 2$ and $d = 3$, and for sample sizes $n = 100$ and $n = 1000$, we computed the Integrated Squared Error (ISE) of the smoothed log-concave maximum likelihood estimator for each of 50 replications. We also computed the ISE of the log-concave maximum likelihood estimator and that of a kernel density estimator with a Gaussian kernel and the optimal ISE bandwidth for each individual data set, which would be unknown in practice. The boxplots of the ISEs for the different methods are given in Figure 2 for $d = 3$. The analogous plots for the case $d = 2$ can be found in Chen and Samworth (2011).
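
The threshold $\|\mu\| \le 2$ for log-concavity of the mixture can be recovered from the Hessian of $\log f$. The derivation below is a sketch for a general mixing weight $p \in (0, 1)$ (here $p = 0.4$); the posterior weight $w(x)$ is notation introduced only for this calculation.

```latex
\begin{align*}
f(x) &= p\,\phi_d(x) + (1-p)\,\phi_d(x-\mu), \qquad
w(x) = \frac{(1-p)\,\phi_d(x-\mu)}{f(x)} \in (0,1), \\
\nabla \log f(x) &= -x + w(x)\,\mu, \qquad
\nabla w(x) = w(x)\{1-w(x)\}\,\mu, \\
\nabla^2 \log f(x) &= -I + w(x)\{1-w(x)\}\,\mu\mu^\top, \\
\lambda_{\max}\bigl(\nabla^2 \log f(x)\bigr) &= -1 + w(x)\{1-w(x)\}\,\|\mu\|^2 .
\end{align*}
```

As $x$ moves along the direction of $\mu$, the weight $w(x)$ takes every value in $(0, 1)$, so $\sup_x w(x)\{1 - w(x)\} = 1/4$. The Hessian is therefore non-positive definite everywhere if and only if $\|\mu\|^2 / 4 \le 1$, that is $\|\mu\| \le 2$, whatever the value of $p$.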

We see that when the true density is log-concave, the smoothed log-concave estimator offers 
substantial ISE improvements over its unsmoothed analogue for both sample sizes, particularly 
at the smaller sample size n = 100. It also outperforms by a considerable margin the kernel 
density estimator with the optimal ISE bandwidth. When the log-concavity assumption is 
violated, the smoothed log-concave estimator is still competitive with the optimal-ISE kernel 
estimator at the smaller sample size n = 100, and also improves on its unsmoothed analogue. 
However, at the larger sample size n = 1000, the bias caused by the fact that $\int_{\mathbb{R}^d} (f^{**} - f)^2 > 0$
dominates the contribution from the variance of the estimator, and the kernel estimator is an
improvement. These results confirm that the smoothed log-concave estimator has excellent 
performance when the true density is log-concave, and remains competitive in situations where 
the log-concavity assumption is violated, provided that the modelling bias caused by this 
misspecification is not too large relative to the sampling variability of the estimator. 



3 A new test of log-concavity 



Several tests of log-concavity have been proposed in the literature. An (1995) and Walther (2002) discuss various tests for univariate data, while Cule, Samworth and Stewart (2010) presented two tests of log-concavity for multivariate data. Hazelton (2011) proposed another multivariate test based on kernel density estimates which had improved finite-sample performance on his simulated examples. However, none of these multivariate tests has theoretical support.

Figure 2: Boxplots of ISEs for $d = 3$ with the Gaussian location mixture true density for the smoothed log-concave maximum likelihood estimator SMLCD, log-concave maximum likelihood estimator LCD and kernel density estimator with the 'oracle' optimal ISE bandwidth: (a) $n = 100$, $\|\mu\| = 1$; (b) $n = 100$, $\|\mu\| = 2$; (c) $n = 100$, $\|\mu\| = 3$; (d) $n = 1000$, $\|\mu\| = 1$; (e) $n = 1000$, $\|\mu\| = 2$; (f) $n = 1000$, $\|\mu\| = 3$.

Suppose $X_1, \ldots, X_n \sim P_0 \in \mathcal{P}_1$, and we seek a size $\alpha \in (0, 1)$ test of $H_0$: $P_0$ has a log-concave density against $H_1$: $P_0$ does not have a log-concave density. Motivated by Theorem 5, we propose the following procedure:

(a) Compute the log-concave maximum likelihood density estimate $\hat f_n$.

(b) Compute the test statistic $\mathrm{tr}(\hat A)$, where $\hat A = \hat\Sigma - \tilde\Sigma$, as in (2.1).

(c) Generate a reference distribution as follows: for $b = 1, \ldots, B$, draw conditionally independent samples $X_{1b}^*, \ldots, X_{nb}^*$ from $\hat f_n$. For each bootstrap sample, first compute the log-concave maximum likelihood estimator $\hat f_{nb}^*$. Then compute $\mathrm{tr}(\hat A_{nb}^*)$, where

$$\hat A_{nb}^* = \frac{1}{n-1} \sum_{i=1}^n (X_{ib}^* - \bar X_b^*)(X_{ib}^* - \bar X_b^*)^\top - \int_{\mathbb{R}^d} (x - \bar X_b^*)(x - \bar X_b^*)^\top \hat f_{nb}^*(x)\,dx,$$

and $\bar X_b^* = n^{-1} \sum_{i=1}^n X_{ib}^*$.

(d) Reject $H_0$ if $(B+1)^{-1} \sum_{b=1}^B 1_{\{\mathrm{tr}(\hat A) > \mathrm{tr}(\hat A_{nb}^*)\}} > 1 - \alpha$.

We call this procedure a trace test. It is justified by the following result:

Theorem 7. Suppose that $P_0 \in \mathcal{P}_1$. The trace test is consistent: that is, if $P_0$ is not log-concave, then for each $B \in \mathbb{N}$, the power of the test converges to one as $n \to \infty$.
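
The decision step of the trace test is simple once the observed statistic and the bootstrap statistics are available. The Python sketch below covers only that final step, reading the rejection rule in step (d) as $(B+1)^{-1} \sum_{b=1}^B 1_{\{\mathrm{tr}(\hat A) > \mathrm{tr}(\hat A_{nb}^*)\}} > 1 - \alpha$. The function name is hypothetical, the bootstrap traces in the demonstration are made-up numbers, and fitting the log-concave estimator to each bootstrap sample is assumed to have been done elsewhere.

```python
def trace_test_rejects(observed_trace, bootstrap_traces, alpha=0.05):
    """Step (d): reject when the observed trace exceeds enough bootstrap traces."""
    n_boot = len(bootstrap_traces)
    # Count the bootstrap replicates whose trace is smaller than the observed one.
    n_exceeded = sum(1 for t in bootstrap_traces if observed_trace > t)
    return n_exceeded / (n_boot + 1) > 1 - alpha


# B = 99 made-up bootstrap traces, purely for illustration.
reference = [0.01 * b for b in range(1, 100)]
print(trace_test_rejects(2.0, reference))    # observed trace above all replicates
print(trace_test_rejects(0.505, reference))  # observed trace near the bootstrap median
```

With $B = 99$ and $\alpha = 0.05$, the rule rejects only when the observed trace exceeds at least 96 of the 99 bootstrap traces.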

We remark that if $P_0 \in \mathcal{P}_2$, one can also draw bootstrap samples from $\tilde f_n$ instead of $\hat f_n$ in Step (c). To illustrate the performance of the test, we ran two small simulation studies. In the first study, we simulated from a bivariate two-component normal location mixture density with components $\phi_{2,I}(x)$ and $\phi_{2,I}(x - \mu)$, with $\|\mu\| = 0, 2, 4$ (which we recall is log-concave if and only if $\|\mu\| \le 2$). For each simulation setup, we performed 200 hypothesis tests with $B = 99$. The proportion of times that the null hypothesis was rejected in a size $\alpha = 0.05$ test is reported in Table 1. For comparison, we also report the results from the critical bandwidth test proposed by Hazelton (2011). The permutation test studied by Cule, Samworth and Stewart (2010) did not perform as well as the critical bandwidth test, so we omitted its results here. For the second study, we replicate the settings considered in Hazelton (2011), where four different types of bivariate densities of independent components were chosen. The marginal distributions were:

(a) A two-component mixture of $N(0, 1/4)$ and $N(0, 4)$ distributions, and a two-component mixture of $N(0, 1/4)$ and $N(2, 4)$ distributions;

(b) A $t$ distribution in both cases;

(c) A two-component mixture of $N(0, 1/4)$ and $N(2, 4)$ distributions, and a $t$ distribution;

(d) A two-component mixture of $N(0, 1/4)$ and $N(2, 5)$ distributions, and a $\Gamma(2, 1)$ distribution.

  n     Method                $\|\mu\| = 0$   $\|\mu\| = 2$   $\|\mu\| = 4$
  200   critical bandwidth    0.065           0.015           0.985
  200   trace                 0.045           0.045           1.000
  500   critical bandwidth    0.045           0.005           1.000
  500   trace                 0.045           0.055           1.000

Table 1: Proportion of times out of 200 repetitions that the null hypothesis was rejected with $\alpha = 0.05$.

Note that all of these densities are unimodal but not log-concave. The corresponding estimates of the power of the tests are presented in Table 2. The first study confirms that the trace test controls the Type I error satisfactorily (and appears to be less conservative than the critical bandwidth test when $\|\mu\| = 2$). The results of the second study, though, are quite striking, and suggest that our new test for log-concavity has considerably improved finite-sample power compared to the critical bandwidth test. Hazelton (2011) noted that the critical bandwidth test can have reduced power due to the boundary bias of the kernel estimators and is quite sensitive to outliers (in fact, one also needs to pick a compact region containing the majority of the data, and this choice is somewhat arbitrary). Our test avoids these issues and performs well even in the presence of outliers or when the true density has bounded support.

  n     Method                Case (a)   Case (b)   Case (c)   Case (d)
  200   critical bandwidth    0.520      0.195      0.395      0.295
  200   trace                 1.000      0.960      1.000      1.000
  500   critical bandwidth    0.760      0.340      0.710      0.505
  500   trace                 1.000      1.000      1.000      1.000

Table 2: Proportion of times out of 200 repetitions that the null hypothesis was rejected with $\alpha = 0.05$.



4 Other applications 
4.1 Classification problems 

Changing notation slightly from the previous section, we now assume that $(X, Y), (X_1, Y_1), \ldots, (X_n, Y_n)$ are independent and identically distributed pairs taking values in $\mathbb{R}^d \times \{1, \ldots, K\}$. Let $\mathbb{P}(Y = k) = \pi_k$ for $k = 1, \ldots, K$, and suppose that conditional on $Y = k$, the random vector $X$ has distribution $P_k$.

A classifier is a measurable function $C : \mathbb{R}^d \to \{1, \ldots, K\}$, with the interpretation that the classifier assigns the point $x \in \mathbb{R}^d$ to class $C(x)$. The misclassification error rate, or risk, of $C$ is

$$\mathrm{Risk}(C) = \mathbb{P}\{C(X) \ne Y\}.$$

In the case where each distribution $P_k$ has a density $f_k$, the classifier that minimises the risk is the Bayes classifier $C^{\mathrm{Bayes}}$, given by

$$C^{\mathrm{Bayes}}(x) = \operatorname*{argmax}_{k \in \{1, \ldots, K\}} \pi_k f_k(x).$$

(For all classifiers defined by an argmax as above, we will for the sake of definiteness split ties by taking the smallest element of the argmax.) We will also be interested in the log-concave Bayes classifier and smoothed log-concave Bayes classifier, defined respectively by

$$C^{\mathrm{LCBayes}}(x) = \operatorname*{argmax}_{k \in \{1, \ldots, K\}} \pi_k f_k^*(x) \qquad \text{and} \qquad C^{\mathrm{SLCBayes}}(x) = \operatorname*{argmax}_{k \in \{1, \ldots, K\}} \pi_k f_k^{**}(x).$$

Here, $f_k^*$ and $f_k^{**}$ are the log-concave approximation to $P_k$ and its smoothed analogue, defined in Theorem 2. In particular, both classifiers coincide with the Bayes classifier when $\{P_k : k = 1, \ldots, K\}$ have log-concave densities. Empirical analogues of these theoretical classifiers are given by

$$\hat C_n^{\mathrm{LC}}(x) = \operatorname*{argmax}_{k \in \{1, \ldots, K\}} N_k \hat f_{n,k}(x) \qquad \text{and} \qquad \hat C_n^{\mathrm{SLC}}(x) = \operatorname*{argmax}_{k \in \{1, \ldots, K\}} N_k \tilde f_{n,k}(x).$$

Here, $N_k = \sum_{i=1}^n 1_{\{Y_i = k\}}$ is the number of observations from the $k$th class, and $\hat f_{n,k}$ and $\tilde f_{n,k}$ are respectively the log-concave maximum likelihood estimator of $f_k$ and its smoothed analogue, based on $\{X_i : Y_i = k\}$.

The theorem below describes the asymptotic behaviour of these classifiers. It reveals that the risk of $\hat C_n^{\mathrm{LC}}$ and $\hat C_n^{\mathrm{SLC}}$ converges not (in general) to the Bayes risk, but instead to the risk of $C^{\mathrm{LCBayes}}$ and $C^{\mathrm{SLCBayes}}$ respectively. This is a similar situation to that encountered when a parametric classifier such as linear or quadratic discriminant analysis is used, but the relevant parametric modelling assumptions fail to hold. It suggests that the classifiers $\hat C_n^{\mathrm{LC}}$ and $\hat C_n^{\mathrm{SLC}}$ should only be used when the hypothesis of log-concavity can be expected to hold, at least approximately.

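Theorem 8. (a) Assume $P_k \in \mathcal{P}_1$ for $k = 1, \ldots, K$. Let $\mathcal{X}^* = \{x \in \mathbb{R}^d : |\operatorname{argmax}_k \pi_k f_k^*(x)| = 1\}$. Then $\hat C_n^{\mathrm{LC}}(x) \stackrel{a.s.}{\to} C^{\mathrm{LCBayes}}(x)$ for almost all $x \in \mathcal{X}^*$, and

$$\mathrm{Risk}(\hat C_n^{\mathrm{LC}}) \to \mathrm{Risk}(C^{\mathrm{LCBayes}}).$$

(b) Now assume $P_k \in \mathcal{P}_2$ for $k = 1, \ldots, K$. Let $\mathcal{X}^{**} = \{x \in \mathbb{R}^d : |\operatorname{argmax}_k \pi_k f_k^{**}(x)| = 1\}$. Then $\hat C_n^{\mathrm{SLC}}(x) \stackrel{a.s.}{\to} C^{\mathrm{SLCBayes}}(x)$ for almost all $x \in \mathcal{X}^{**}$, and

$$\mathrm{Risk}(\hat C_n^{\mathrm{SLC}}) \to \mathrm{Risk}(C^{\mathrm{SLCBayes}}).$$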

In fact, the smoothed log-concave classifier is somewhat easier to apply in practical classification problems than its unsmoothed analogue. This is because if $x_0 \in \mathbb{R}^d$ is outside the convex hull of the training data for each of the $K$ classes (an event of positive probability), then the log-concave maximum likelihood estimates of the densities at $x_0$ are all zero. Thus all such points would be assigned by $\hat C_n^{\mathrm{LC}}$ to Class 1. On the other hand, $\hat C_n^{\mathrm{SLC}}$ avoids this problem altogether. For these reasons, we considered only $\hat C_n^{\mathrm{SLC}}$ in our simulation study (Chen and Samworth, 2011) and below.

We remark that the direct use of $\hat C_n^{\mathrm{SLC}}$ (or any other classifier based on nonparametric density estimation) is not recommended when $d > 4$, due to the curse of dimensionality. In such circumstances there are two options: dimension reduction (cf. Section 4.2 below), or further modelling assumptions such as independent component analysis models (Samworth and Yuan, 2012). In either case, the methodology we develop remains applicable, but now as part of a more involved procedure.



4.2 Breast cancer example 



In the Wisconsin breast cancer data set (Street, Wolberg and Mangasarian, 1993), 30 measurements were taken from a digitised image of a fine needle aspirate of different breast masses. There are 357 benign and 212 malignant instances, and we aim to construct a classifier based on this training data set to aid future diagnoses. Only the first two principal components of the training data were considered, and these capture 63% of the total variability; cf. Figure 3(a). This was done to make our procedure computationally feasible, to reduce the effect of the curse of dimensionality, and to facilitate plots such as Figure 3 below.

In Figure 3(b), we show the smoothed log-concave density estimates of both the benign and malignant classes. Figure 3(c) plots the decision boundaries of the smoothed log-concave classifier, where we treat benign cases and malignant cases equally. However, in practice, misdiagnosing a malignant tumour as benign is much more serious than misidentifying a benign one as malignant. One may therefore seek to incorporate different losses into the classifier. For $k = 1, 2$, let $L_k$ denote the cost of failure to recognise the class $k$ (this notion can easily be generalised to multicategory situations where $L_{kk'}$ is the loss incurred in assigning the pair $(X, Y)$ to class $k'$ when $Y = k$). Redefining the risk as

$$\mathrm{Risk}(C) = L_1\,\mathbb{P}\{C(X) = 2 \cap Y = 1\} + L_2\,\mathbb{P}\{C(X) = 1 \cap Y = 2\},$$

the same asymptotic properties continue to hold, mutatis mutandis, for the classifier

$$\hat C_n^{\mathrm{SLC}*}(x) = \operatorname*{argmax}_{k \in \{1, 2\}} N_k L_k \tilde f_{n,k}(x).$$

Figure 3: (a) Wisconsin breast cancer data (benign cases in green; malignant cases in red); (b) smoothed log-concave maximum likelihood density estimates; (c) and (d) plot the decision boundaries of the smoothed log-concave classifier, where the loss $L_2 = 1$ and $L_2 = 100$, respectively.

We observe that this modification requires no recalculation of the smoothed log-concave density
estimates and there is no loss of generality in taking $L_1 = 1$. A GUI with slider is
implemented in the R package LogConcDEAD, which provides a way of demonstrating how the
decision boundaries change as $L_2$ varies. For the purpose of illustration, Figure 3(d) plots
the decision boundaries of $\hat{C}_n^{\mathrm{SLC}*}$ when the cost $L_2$ of misidentifying a malignant tumour is
100. Compared with Figure 3(c), observations are of course considerably more likely to be
classified as malignant under this setting.
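The loss-weighted rule above can be sketched in a few lines. This is only an illustration under stated assumptions: the two isotropic Gaussian densities are hypothetical placeholders standing in for the fitted smoothed log-concave class densities, and the names `gaussian_pdf_2d` and `weighted_classify` are invented for this sketch. Only the class sizes 357 and 212 and the losses 1 and 100 come from the text.

```python
import math


def gaussian_pdf_2d(x, mean, sd):
    """Isotropic bivariate normal density (a placeholder for a fitted class density)."""
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    return math.exp(-(dx * dx + dy * dy) / (2 * sd * sd)) / (2 * math.pi * sd * sd)


def weighted_classify(x, class_sizes, losses, densities):
    """Return the 1-based label k maximising N_k * L_k * f_k(x)."""
    scores = [n_k * l_k * f_k(x)
              for n_k, l_k, f_k in zip(class_sizes, losses, densities)]
    return 1 + max(range(len(scores)), key=scores.__getitem__)


# Class sizes from the text: 357 benign (class 1), 212 malignant (class 2).
class_sizes = [357, 212]
# Hypothetical stand-ins for the two density estimates on the first two principal components.
densities = [
    lambda x: gaussian_pdf_2d(x, (-1.0, 0.0), 1.0),
    lambda x: gaussian_pdf_2d(x, (2.0, 0.0), 1.5),
]

point = (0.0, 0.0)
label_equal = weighted_classify(point, class_sizes, [1.0, 1.0], densities)
label_costly = weighted_classify(point, class_sizes, [1.0, 100.0], densities)
```

Changing the losses only rescales the scores, so the density estimates are evaluated but never refitted; in this toy setting the same point moves from class 1 to class 2 once $L_2 = 100$.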

4.3 Functional estimation problems 

Classification problems are an important example of a situation where one is interested in
a functional of one or more density estimates, rather than the density estimate itself. For
simplicity of exposition, we return in this section to the situation where we have a single
independent sample $X_1, \ldots, X_n$ distributed according to a distribution $P_0$.

In general, we can consider estimating a functional $\theta_0 = \theta(P_0)$ using the plug-in smoothed
log-concave estimate $\tilde{\theta}_n = \theta(\tilde{P}_n)$, where $\tilde{P}_n$ is the distribution with density $\tilde{f}_n$. Note that even
if this functional cannot be computed directly, it is usually straightforward to construct a
Monte Carlo approximation to $\tilde{\theta}_n$ by applying the algorithm for sampling from $\tilde{f}_n$ outlined in
Section 2.2.3. To describe the theoretical properties of these functional estimates, for $a > 0$,
let $B_a$ denote the set of signed measures $P$ on $\mathbb{R}^d$ with $\int_{\mathbb{R}^d} e^{a\|x\|} \, d|P|(x) < \infty$. Equip $B_a$ with
the norm

\[
  \|P\|_a = \int_{\mathbb{R}^d} e^{a\|x\|} \, d|P|(x).
\]

We can then consider $\theta$ as a measurable function on $(B_a, \|\cdot\|_a)$ taking values in some other
normed space $(B, \|\cdot\|)$.

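The Monte Carlo approximation mentioned above can be sketched as follows for a linear functional $\theta(P) = \int g \, dP$. This is a minimal sketch under assumptions: `stand_in_sampler` is a hypothetical placeholder (a standard normal draw) for a routine that samples from the fitted smoothed density, and `monte_carlo_functional` is a name invented here, not part of any package.

```python
import random


def monte_carlo_functional(sampler, g, num_draws, rng):
    """Approximate theta(P) = integral of g dP by averaging g over draws from P."""
    return sum(g(sampler(rng)) for _ in range(num_draws)) / num_draws


def stand_in_sampler(rng):
    # Placeholder for one draw from the fitted smoothed density (assumption):
    # here simply a standard normal variate.
    return rng.gauss(0.0, 1.0)


rng = random.Random(0)
# Example functional: the second moment, g(x) = x^2.
second_moment = monte_carlo_functional(stand_in_sampler, lambda x: x * x, 20000, rng)
```

With the placeholder sampler the target value is 1, so the estimate should be close to 1; in practice the sampler would be replaced by draws from the fitted estimator.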
Proposition 9. Let $P_0 \in \mathcal{P}_2$, and let $P_0^{**}$ denote the probability distribution whose density
is the smoothed version of the log-concave approximation to $P_0$. Suppose that $\theta : B_a \to B$ is
continuous, and let $\theta^{**} = \theta(P_0^{**})$. Then $\|\tilde{\theta}_n - \theta^{**}\| \stackrel{a.s.}{\to} 0$ as $n \to \infty$.

Once again, we remark that if $P_0$ has a log-concave density, then $P_0 = P_0^{**}$. The fact that
the topology on $B_a$ is rather strong means that the continuity requirement on $\theta$ is relatively
weak. This is illustrated in the following corollary, which considers the special case of linear
functionals in Proposition 9.

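Corollary 10. Let $P_0 \in \mathcal{P}_2$, and let $a_0 > 0$ and $b_0 \in \mathbb{R}$ be such that $f^{**}(x) \le e^{-a_0\|x\| + b_0}$,
where $f^{**}$ is the smoothed log-concave approximation to $P_0$. Let $\theta(P) = \int_{\mathbb{R}^d} g \, dP$ for some
measurable function $g : \mathbb{R}^d \to \mathbb{R}$ satisfying

\[
  \sup_{x \in \mathbb{R}^d} e^{-a\|x\|} |g(x)| < \infty \tag{4.1}
\]

for some $a < a_0$. Then $\tilde{\theta}_n \stackrel{a.s.}{\to} \theta^{**}$.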
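As an illustrative check of condition (4.1), which is not taken from the original text, consider the second-moment functional with $g(x) = \|x\|^2$. The short calculation below suggests that (4.1) holds for every $a > 0$, so Corollary 10 would apply with any $a \in (0, a_0)$.

```latex
\[
  \sup_{x \in \mathbb{R}^d} e^{-a\|x\|} \|x\|^2
  = \sup_{r \ge 0} r^2 e^{-ar}
  = \frac{4}{a^2 e^2} < \infty
  \qquad \text{for every } a > 0,
\]
since $r \mapsto r^2 e^{-ar}$ is maximised at $r = 2/a$. Hence, for any $a \in (0, a_0)$,
\[
  \int_{\mathbb{R}^d} \|x\|^2 \tilde{f}_n(x) \, dx
  \stackrel{a.s.}{\to}
  \int_{\mathbb{R}^d} \|x\|^2 f^{**}(x) \, dx .
\]
```

The same argument covers any $g$ of polynomial growth, since polynomials are dominated by $e^{a\|x\|}$ for every $a > 0$.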



Acknowledgments 

We thank the Associate Editor and two anonymous referees for their helpful comments. The 
second author is grateful for the support of a Leverhulme Research Fellowship and an EPSRC 
Early Career Fellowship. 

5 Appendix 

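Proof of Proposition 1

(a) This follows immediately from Theorems 2.8 and 2.18 of Dharmadhikari and Joag-Dev
(1988).

(b) Note that for any non-empty open set $B \subseteq \mathbb{R}^d$,

\[
  \int_B \tilde{f}_n(x) \, dx = \int_B \int_{C_n} \hat{f}_n(y) \phi_{d,\hat{A}}(x - y) \, dy \, dx,
\]

which is positive, since the integrand is positive and continuous on the region of integration.

(c) The fact that $\tilde{f}_n$ is infinitely differentiable follows from Proposition 8.10 of Folland
(1999). In fact, using standard multi-index notation with $\alpha = (\alpha_1, \ldots, \alpha_d)$ and
$\partial^\alpha = (\partial/\partial x_1)^{\alpha_1} \cdots (\partial/\partial x_d)^{\alpha_d}$, we have $\partial^\alpha \tilde{f}_n = \hat{f}_n * \partial^\alpha \phi_{d,\hat{A}}$. Writing $|\alpha| = \sum_{j=1}^d \alpha_j$ and
$\alpha! = \prod_{j=1}^d \alpha_j!$, it follows that for any $x_0 \in \mathbb{R}^d$ and $k \in \mathbb{N}$,

\[
  \biggl| \tilde{f}_n(x) - \sum_{|\alpha| \le k} \frac{\partial^\alpha \tilde{f}_n(x_0)}{\alpha!} (x - x_0)^\alpha \biggr|
  \le \int_{\mathbb{R}^d} \hat{f}_n(y) \biggl| \phi_{d,\hat{A}}(x - y) - \sum_{|\alpha| \le k} \frac{\partial^\alpha \phi_{d,\hat{A}}(x_0 - y)}{\alpha!} (x - x_0)^\alpha \biggr| \, dy \to 0
\]

as $k \to \infty$, by the dominated convergence theorem and Lemma 1 of Cule and Samworth
(2010).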

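(d) Conditional on $X_1, \ldots, X_n$, let $X^*$ and $Y^*$ be independent, with $X^*$ having density $\hat{f}_n$
and $Y^*$ having density $\phi_{d,\hat{A}}$, so that $X^* + Y^*$ has conditional density $\tilde{f}_n$. Then

\[
  \mathbb{E}(X^* + Y^* \mid X_1, \ldots, X_n) = \mathbb{E}(X^* \mid X_1, \ldots, X_n) = \int_{\mathbb{R}^d} x \hat{f}_n(x) \, dx = \bar{X},
\]

and

\[
  \mathrm{Cov}(X^* + Y^* \mid X_1, \ldots, X_n) = \mathrm{Cov}(X^* \mid X_1, \ldots, X_n) + \mathrm{Cov}(Y^* \mid X_1, \ldots, X_n) = \tilde{\Sigma} + \hat{A} = \hat{\Sigma}.
\]

□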
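The decomposition used in part (d) can be checked numerically in one dimension. The sketch below is illustrative only: a uniform density on $[-1, 1]$ is a hypothetical stand-in for the log-concave estimate (it is log-concave, but it is not a fitted estimator), `noise_var` plays the role of the smoothing variance, and `draw_smoothed` is a name invented for this example.

```python
import random
import statistics


def draw_smoothed(rng, half_width, noise_var):
    """Draw X* + Y*, with X* uniform on [-half_width, half_width] and Y* ~ N(0, noise_var)."""
    x_star = rng.uniform(-half_width, half_width)  # stand-in for a draw from the unsmoothed estimate
    y_star = rng.gauss(0.0, noise_var ** 0.5)      # independent Gaussian smoothing noise
    return x_star + y_star


rng = random.Random(1)
half_width, noise_var = 1.0, 0.25
draws = [draw_smoothed(rng, half_width, noise_var) for _ in range(50000)]

sample_mean = statistics.fmean(draws)
sample_var = statistics.variance(draws)
# Variance of Uniform[-w, w] is w^2 / 3; independence makes the variances add.
expected_var = half_width ** 2 / 3 + noise_var
```

The sample mean stays near the mean of the unsmoothed draw (zero here), while the sample variance is close to the sum of the two variances, mirroring the two displays in the proof.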

Proof of Theorem 2

Let $d_P$ and $d_{TV}$ denote the Prohorov and total variation metrics on the space of probability
measures on $\mathbb{R}^d$. Recall that $d_P$ metrises weak convergence, and that $d_P \le d_{TV}$. Let $\tilde{\mu}_n$
denote the probability measure corresponding to the density $\tilde{f}_n$, let $\tilde{\nu}_n$ denote the probability
measure corresponding to the convolution of $f^*$ with the measure $N_d(0, \hat{A})$, and let $\nu$ denote
the probability measure corresponding to $f^{**}$. Then

\[
\begin{aligned}
  d_P(\tilde{\mu}_n, \nu) &\le d_P(\tilde{\mu}_n, \tilde{\nu}_n) + d_P(\tilde{\nu}_n, \nu) \\
  &\le d_{TV}(\tilde{\mu}_n, \tilde{\nu}_n) + d_P(\tilde{\nu}_n, \nu) \\
  &= \frac{1}{2} \int_{\mathbb{R}^d} |\hat{f}_n * N_d(0, \hat{A}) - f^* * N_d(0, \hat{A})| + d_P(\tilde{\nu}_n, \nu) \\
  &\le \frac{1}{2} \int_{\mathbb{R}^d} |\hat{f}_n - f^*| + d_P(\tilde{\nu}_n, \nu).
\end{aligned} \tag{5.1}
\]

The first term of (5.1) converges almost surely to zero, by Theorem 2.15 of Dümbgen, Samworth
and Schuhmacher (2011). The second term also converges almost surely to zero, using the fact
that $\hat{A} \stackrel{a.s.}{\to} A^*$ as $n \to \infty$. Proposition 2 of Cule and Samworth (2010) strengthens the mode of
convergence and yields the result. □
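Proof of Proposition 3

If $x \in C_{n,j}$, and $\hat{f}_n(x) = \exp(b_j^T x - \beta_j)$, then $\hat{f}_n(x - y) \le \exp\{b_j^T (x - y) - \beta_j\}$ for all $y \in \mathbb{R}^d$.
It follows that

\[
  \frac{\tilde{f}_n(x) - \hat{f}_n(x)}{\hat{f}_n(x)} \le \int_{\mathbb{R}^d} e^{-b_j^T y} \phi_{d,\hat{A}}(y) \, dy - 1.
\]

Now

\[
  \int_{\mathbb{R}^d} |\tilde{f}_n - \hat{f}_n| = \int_{C_n} |\tilde{f}_n - \hat{f}_n| + \int_{C_n^c} \tilde{f}_n
  = \int_{C_n} (\tilde{f}_n - \hat{f}_n)_+ + \int_{C_n} (\hat{f}_n - \tilde{f}_n)_+ + \delta_n. \tag{5.2}
\]

But

\[
  \int_{C_n} (\hat{f}_n - \tilde{f}_n)_+ = \int_{C_n} (\tilde{f}_n - \hat{f}_n)_+ + \delta_n.
\]

It therefore follows from this and (5.2) that

\[
  \int_{\mathbb{R}^d} |\tilde{f}_n - \hat{f}_n| = 2 \int_{C_n} (\tilde{f}_n - \hat{f}_n)_+ + 2\delta_n
  \le 2 \sum_j \int_{C_{n,j}} \hat{f}_n(x) (e^{b_j^T \hat{A} b_j / 2} - 1) \, dx + 2\delta_n
  \le 2 (e^{\max_j b_j^T \hat{A} b_j / 2} - 1 + \delta_n),
\]

as required.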

□

Proof of Theorem 4

(a) Let $f$ be an arbitrary log-concave density on $\mathbb{R}^d$, and let $X$ be a random vector with
density $f$. Letting $X = (X_1^T, X_2^T)^T$, where $X_1$ and $X_2$ take values in $\mathbb{R}^{d_1}$ and $\mathbb{R}^{d_2}$ respectively,
we write $f_{X_1}$ for the marginal density of $X_1$ and $f_{X_2|X_1}(\cdot \mid x_1)$ for the conditional density of $X_2$
given $X_1 = x_1$. By Theorem 6 of Prékopa (1973), $f_{X_1}$ is log-concave and by Proposition 1 of
Cule, Samworth and Stewart (2010), $f_{X_2|X_1}(\cdot \mid x_1)$ is log-concave for each $x_1$.

There is also no loss of generality in assuming $f$ is upper semi-continuous. Since $P \in \mathcal{P}_1$,
we may assume without loss of generality that $\int_{\mathbb{R}^d} |\log f| \, dP < \infty$. We may therefore apply
Fubini's theorem and seek to maximise over all upper semi-continuous log-concave densities
the quantity

\[
\begin{aligned}
  \int_{\mathbb{R}^d} \log f \, dP
  &= \int_{\mathbb{R}^{d_1}} \int_{\mathbb{R}^{d_2}} \{\log f_{X_1}(x_1) + \log f_{X_2|X_1}(x_2 \mid x_1)\} \, dP_2(x_2) \, dP_1(x_1) \\
  &= \int_{\mathbb{R}^{d_1}} \log f_{X_1}(x_1) \, dP_1(x_1) + \int_{\mathbb{R}^{d_1}} \int_{\mathbb{R}^{d_2}} \log f_{X_2|X_1}(x_2 \mid x_1) \, dP_2(x_2) \, dP_1(x_1).
\end{aligned} \tag{5.3}
\]

The first term on the right-hand side of (5.3) is maximised uniquely over all upper semi-continuous
log-concave densities by setting $f_{X_1} = f_1^*$. Moreover, for any fixed $x_1$, the quantity
$\int_{\mathbb{R}^{d_2}} \log f_{X_2|X_1}(x_2 \mid x_1) \, dP_2(x_2)$ is maximised uniquely over upper semi-continuous log-concave
densities by setting $f_{X_2|X_1}(\cdot \mid x_1) = f_2^*$. Since this choice does not depend on $x_1$, it maximises
the second term on the right-hand side of (5.3). Because both terms can be maximised
simultaneously, it follows that $f^*(x) = f_1^*(x_1) f_2^*(x_2)$, as desired.

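(b) Write $\Sigma$ and $\Sigma^*$ for the covariance matrices corresponding to the probability distribution
$P$ and the density $f^*$ respectively. The independence structure of $P$ and $f^*$ gives that

\[
  \Sigma = \begin{pmatrix} \Sigma_1 & 0 \\ 0 & \Sigma_2 \end{pmatrix}
  \quad \text{and} \quad
  \Sigma^* = \begin{pmatrix} \Sigma_1^* & 0 \\ 0 & \Sigma_2^* \end{pmatrix}.
\]

Here, $\Sigma_1$ and $\Sigma_1^*$ are $d_1 \times d_1$ submatrices, while $\Sigma_2$ and $\Sigma_2^*$ are $d_2 \times d_2$ submatrices.
Therefore, $A^* = \Sigma - \Sigma^*$ is of the form

\[
  A^* = \begin{pmatrix} A_1^* & 0 \\ 0 & A_2^* \end{pmatrix}.
\]

Writing $x, y \in \mathbb{R}^d$ as $(x_1^T, x_2^T)^T$ and $(y_1^T, y_2^T)^T$ respectively, where $x_1, y_1 \in \mathbb{R}^{d_1}$ and $x_2, y_2 \in \mathbb{R}^{d_2}$,
it follows again by Fubini's theorem that

\[
\begin{aligned}
  f^{**}(x) &= (f^* * N_d(0, A^*))(x) \\
  &= \int_{\mathbb{R}^{d_1}} \int_{\mathbb{R}^{d_2}} f_1^*(y_1) f_2^*(y_2) \, dN_{d_2}(0, A_2^*)(x_2 - y_2) \, dN_{d_1}(0, A_1^*)(x_1 - y_1) \\
  &= \biggl[ \int_{\mathbb{R}^{d_1}} f_1^*(y_1) \, dN_{d_1}(0, A_1^*)(x_1 - y_1) \biggr] \biggl[ \int_{\mathbb{R}^{d_2}} f_2^*(y_2) \, dN_{d_2}(0, A_2^*)(x_2 - y_2) \biggr] \\
  &= f_1^{**}(x_1) f_2^{**}(x_2).
\end{aligned}
\]

□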



Proof of Theorem 5

Let $P \in \mathcal{P}_1$, and let $f^*$ denote its log-concave approximation. Without loss of generality,
we may assume $\int_{\mathbb{R}^d} x \, dP(x) = 0$, so it suffices to show that if $A^* := \int_{\mathbb{R}^d} x x^T \, dP(x) -
\int_{\mathbb{R}^d} x x^T f^*(x) \, dx$ is the zero matrix, then $P$ has a log-concave density.

Let $P^*$ denote the distribution corresponding to $f^*$, let $X \sim P$ and let $X^* \sim P^*$. For
an arbitrary $u \in \mathbb{R}^d$, let $F_u$ and $F_u^*$ denote the distribution functions of $u^T X$ and $u^T X^*$
respectively, and let

\[
  G_u(s) = \int_{-\infty}^s F_u(t) \, dt \quad \text{and} \quad G_u^*(s) = \int_{-\infty}^s F_u^*(t) \, dt.
\]

Fix $s \in \mathbb{R}$. By applying Remark 2.3 of Dümbgen, Samworth and Schuhmacher (2011) to the
convex function $x \mapsto (s - u^T x)_+$ and Fubini's theorem, we have that

\[
\begin{aligned}
  0 &\le \int_{\mathbb{R}^d} (s - u^T x)_+ \, d(P - P^*)(x) = \int_{\mathbb{R}^d} \int_{-\infty}^{\infty} 1_{\{u^T x \le t \le s\}} \, dt \, d(P - P^*)(x) \\
  &= \int_{-\infty}^s (F_u - F_u^*)(t) \, dt = G_u(s) - G_u^*(s).
\end{aligned} \tag{5.4}
\]

Since all moments of log-concave densities are finite, we have $\int_{\mathbb{R}^d} x x^T f^*(x) \, dx < \infty$. So, since
$A^* = 0$, we must have $P \in \mathcal{P}_2$. We can therefore integrate by parts as follows:

\[
\begin{aligned}
  0 &= \int_{\mathbb{R}^d} (u^T x)^2 \, d(P - P^*)(x) = \int_{-\infty}^{\infty} t^2 \, d(F_u - F_u^*)(t) = -2 \int_{-\infty}^{\infty} t (F_u - F_u^*)(t) \, dt \\
  &= 2 \int_{-\infty}^{\infty} (G_u - G_u^*)(t) \, dt.
\end{aligned} \tag{5.5}
\]

Combining (5.4), (5.5) and the fact that $G_u - G_u^*$ is continuous, we deduce that $G_u = G_u^*$.
Thus $F_u = F_u^*$, by the fundamental theorem of calculus and the fact that $F_u$ and $F_u^*$ are both
right-continuous. It follows that

\[
  \mathbb{E}(e^{i u^T X}) = \int_{-\infty}^{\infty} e^{it} \, dF_u(t) = \int_{-\infty}^{\infty} e^{it} \, dF_u^*(t) = \mathbb{E}(e^{i u^T X^*}).
\]

Since $u \in \mathbb{R}^d$ was arbitrary, we deduce that $P = P^*$, so $P$ has a log-concave density. □
Proof of Proposition 6

Suppose that the upper semi-continuous log-concave density $f^*$ is the log-concave approximation
to $P_1, P_2 \in \mathcal{P}_1$. Then for each $t \in (0, 1)$, we see that $f^*$ also maximises

\[
  \int_{\mathbb{R}^d} \log f \, d(tP_1 + (1 - t)P_2) = t \int_{\mathbb{R}^d} \log f \, dP_1 + (1 - t) \int_{\mathbb{R}^d} \log f \, dP_2
\]

over all upper semi-continuous log-concave densities $f$ on $\mathbb{R}^d$. □
Proof of Theorem 7

Let $d_2$ denote the second Mallows metric on $\mathcal{P}_2$, so

\[
  d_2(P, Q) = \inf_{(X,Y) \sim (P,Q)} \{\mathbb{E}\|X - Y\|^2\}^{1/2},
\]

where the infimum is taken over all pairs $(X, Y)$ of random vectors $X \sim P$ and $Y \sim Q$ on
a common probability space. Recall that the infimum in this definition is attained, and that
if $P, P_1, P_2, \ldots \in \mathcal{P}_2$, then $d_2(P_n, P) \to 0$ if and only if both $P_n \stackrel{d}{\to} P$ and $\int_{\mathbb{R}^d} \|x\|^2 \, dP_n(x) \to
\int_{\mathbb{R}^d} \|x\|^2 \, dP(x)$. Let $P^*$ denote the distribution corresponding to the log-concave approximation
to $P_0$, and for $\delta > 0$ to be chosen later, let $\mathcal{Q}_{2,\delta}$ denote the subset of $\mathcal{P}_2$ consisting of
those distributions $Q$ with $d_2(Q, P^*) \le \delta$ that have a log-concave density. Fix $\epsilon > 0$ and
let $Q \in \mathcal{Q}_{2,\delta}$. Let $P_n$ and $Q_n$ denote the empirical distribution of an independent sample
of size $n$ from $P^*$ and an independent sample from $Q$ respectively. We will require a bound
for $\mathbb{P}\{d_2(Q_n, P_n) > \epsilon/4\}$ that holds uniformly over $\mathcal{Q}_{2,\delta}$, and obtain this using the following
coupling argument. We may suppose that $(X_1, Y_1), \ldots, (X_n, Y_n)$ are independent and identically
distributed pairs with $X_i \sim P^*$ and $Y_i \sim Q$ and that $P_n$ and $Q_n$ are obtained as the
empirical distribution of $X_1, \ldots, X_n$ and $Y_1, \ldots, Y_n$ respectively. We may further suppose that
$\mathbb{E}\|X_i - Y_i\|^2 = d_2^2(P^*, Q)$; in other words, $X_i$ and $Y_i$ are coupled in such a way that they attain
the infimum in the definition of the second Mallows distance. Using standard results on the
Mallows distance (e.g. Equation (8.2) and Lemma 8.7 of Bickel and Freedman (1981)), we
deduce that for $\delta \le \epsilon^{3/2}/(4\sqrt{3})$,

\[
\begin{aligned}
  \sup_{Q \in \mathcal{Q}_{2,\delta}} \mathbb{P}\{d_2(Q_n, P_n) > \epsilon/4\}
  &\le \sup_{Q \in \mathcal{Q}_{2,\delta}} \mathbb{P}\biggl\{ \frac{1}{n} \sum_{i=1}^n \|X_i - Y_i\|^2 > \frac{\epsilon^2}{16} \biggr\} \\
  &\le \frac{16}{\epsilon^2} \sup_{Q \in \mathcal{Q}_{2,\delta}} \mathbb{E}(\|X_1 - Y_1\|^2) \le \frac{16\delta^2}{\epsilon^2} \le \frac{\epsilon}{3}.
\end{aligned}
\]

Now let $\hat{Q}_n$ denote the distribution corresponding to the log-concave maximum likelihood
estimator constructed from $X_1, \ldots, X_n$, and let $\tilde{Q}_n$ denote the empirical distribution of a
sample of size $n$ which, conditional on $X_1, \ldots, X_n$, is drawn independently from $\hat{Q}_n$. By
reducing $\delta > 0$ if necessary, we may assume $\delta \le \epsilon/4$. It follows that

\[
\begin{aligned}
  \mathbb{P}\{d_2(\tilde{Q}_n, P^*) > \epsilon\}
  &\le \sup_{Q \in \mathcal{Q}_{2,\delta}} \mathbb{P}\{d_2(Q_n, Q) > 3\epsilon/4\} + \mathbb{P}\{d_2(\hat{Q}_n, P^*) > \delta\} \\
  &\le \sup_{Q \in \mathcal{Q}_{2,\delta}} \mathbb{P}\{d_2(Q_n, P_n) > \epsilon/4\} + \mathbb{P}\{d_2(P_n, P^*) > \epsilon/4\} + \mathbb{P}\{d_2(\hat{Q}_n, P^*) > \delta\} \\
  &\le \frac{\epsilon}{3} + \mathbb{P}\{d_2(P_n, P^*) > \epsilon/4\} + \mathbb{P}\{d_2(\hat{Q}_n, P^*) > \delta\} < \epsilon
\end{aligned} \tag{5.6}
\]

for sufficiently large $n$. The convergence to zero of the second term here follows from the
weak law of large numbers, while for the third term it follows from Proposition 2(c) of
Cule and Samworth (2010) and the dominated convergence theorem.
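The coupling bound used above, namely that $d_2^2(Q_n, P_n)$ is at most the average squared distance under any given pairing, can be illustrated numerically. The sketch below is restricted to the real line, where for two empirical distributions of equal size the Mallows distance is attained by pairing the sorted samples; this one-dimensional restriction and the function names `mallows2_1d` and `paired_rms` are assumptions made for the example.

```python
import math
import random


def mallows2_1d(xs, ys):
    """Second Mallows (Wasserstein-2) distance between two equal-size empirical
    distributions on the real line, computed via the sorted (quantile) coupling."""
    if len(xs) != len(ys):
        raise ValueError("samples must have equal size")
    pairs = zip(sorted(xs), sorted(ys))
    return math.sqrt(sum((x - y) ** 2 for x, y in pairs) / len(xs))


def paired_rms(xs, ys):
    """Root mean squared distance under the given pairing (x_i, y_i)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))


rng = random.Random(2)
xs = [rng.gauss(0.0, 1.0) for _ in range(500)]
# Couple each y_i closely to x_i, mimicking a pairing with small E|X_i - Y_i|^2.
ys = [x + rng.gauss(0.0, 0.1) for x in xs]

d2_empirical = mallows2_1d(xs, ys)
coupling_bound = paired_rms(xs, ys)
```

Because the Mallows distance is an infimum over couplings, `d2_empirical` can never exceed `coupling_bound`, which is the inequality that the proof combines with Markov's inequality.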



Let $\tilde{Q}_{nb}$ and $\hat{Q}_{nb}$ denote respectively the empirical distribution and the distribution
corresponding to the log-concave maximum likelihood estimator of the $b$th bootstrap sample
$X_{1b}^*, \ldots, X_{nb}^*$ drawn from $\hat{Q}_n$. We deduce from (5.6), Theorem 2.15 of Dümbgen, Samworth and
Schuhmacher (2011) and another application of Proposition 2(c) of Cule and Samworth (2010)
that there exists $a > 0$ such that

\[
  \int_{\mathbb{R}^d} e^{a\|x\|} \, d|\hat{Q}_{nb} - P^*|(x) \stackrel{p}{\to} 0. \tag{5.7}
\]

Now let

\[
  \hat{A}_{nb} := \frac{n}{n - 1} \int_{\mathbb{R}^d} (x - \bar{X}_b^*)(x - \bar{X}_b^*)^T \, d\tilde{Q}_{nb}(x) - \int_{\mathbb{R}^d} (x - \bar{X}_b^*)(x - \bar{X}_b^*)^T \, d\hat{Q}_{nb}(x),
\]

where $\bar{X}_b^* = n^{-1} \sum_{i=1}^n X_{ib}^*$. From (5.6), (5.7), the dominated convergence theorem and the
continuous mapping theorem, we have that $\mathrm{tr}(\hat{A}_{nb}) \stackrel{p}{\to} 0$ as $n \to \infty$. On the other hand, in
the notation of Theorem 2,

\[
  \mathrm{tr}(\hat{A}) = \mathrm{tr}(\hat{\Sigma}) - \mathrm{tr}(\tilde{\Sigma}) \stackrel{a.s.}{\to} \mathrm{tr}(\Sigma) - \mathrm{tr}(\Sigma^*) = \mathrm{tr}(A^*) > 0,
\]

where the final claim follows from Theorem 5 and the fact that $P_0$ does not have a log-concave
density. Note that this claim holds even if $P_0 \in \mathcal{P}_1 \setminus \mathcal{P}_2$, in which case $\mathrm{tr}(\Sigma) = \infty$.



Write $Z_{nb} = 1_{\{\mathrm{tr}(\hat{A}_{nb}) > \mathrm{tr}(A^*)/2\}}$, and note that $Z_{n1}, \ldots, Z_{nB}$ are exchangeable (so in
particular, identically distributed). Thus, for any $\alpha \in (0, 1)$,

B+l 



( 1 

P(Do not reject E^) =v\ - — - ^ 1 

^ ~'~ b=\ 



{tr(A)>tr(i„i,)} 



< 1 - a 



< P{tr(i) < tr(A*)/2} + P( 5^ > 1 - a ) 

< P{tr(i) < tr(A*)/2} + ^ 



1 — a 

as $n \to \infty$. We deduce that for any given size of test $\alpha \in (0, 1)$, the power at any alternative
converges to 1. □
Proof of Theorem 8

(a) Note that

\[
  \hat{C}_n^{\mathrm{LC}}(x) = \operatorname*{argmax}_{k \in \{1, \ldots, K\}} \frac{N_k}{n} \hat{f}_{n,k}(x).
\]

We have that $\int_{\mathbb{R}^d} |\hat{f}_{n,k} - f_k^*| \stackrel{a.s.}{\to} 0$ as $n \to \infty$ for every $k$, and in fact, by Theorem 10.8 of
Rockafellar (1997), it is almost surely the case that $\hat{f}_{n,k}$ converges to $f_k^*$ uniformly on compact
sets in the interior of the support of $f_k^*$. By the strong law of large numbers and the fact
that the boundary of the support of $f_k^*$ has zero $d$-dimensional Lebesgue measure, it therefore
follows that

\[
  \hat{C}_n^{\mathrm{LC}}(x) \stackrel{a.s.}{\to} C^{\mathrm{LCBayes}}(x)
\]

for almost all $x \in \mathcal{X}^*$.

In fact, with probability one, $\frac{N_k}{n} \hat{f}_{n,k}$ converges to $\pi_k f_k^*$ uniformly on compact sets in the
interior of the support of $f_k^*$. It follows immediately from this and the dominated convergence
theorem that

\[
  \mathrm{Risk}(\hat{C}_n^{\mathrm{LC}}) \to \mathrm{Risk}(C^{\mathrm{LCBayes}}).
\]

(b) The proof is virtually identical to that of Part (a), so is omitted. □
Proof of Proposition 9

The conclusion of Theorem 2 can be stated in the notation of Section 4.3 as

\[
  \|\tilde{P}_n - P_0^{**}\|_a \stackrel{a.s.}{\to} 0.
\]

The result therefore follows immediately by the continuous mapping theorem. □
Proof of Corollary 10

It suffices to show that under condition (4.1), the functional $\theta(P) = \int_{\mathbb{R}^d} g \, dP$ is continuous. Fix
$a < a_0$ such that $\sup_{x \in \mathbb{R}^d} e^{-a\|x\|} |g(x)| < \infty$, and choose a sequence $(P_n)$ such that $\|P_n - P\|_a \to
0$. Then

\[
\begin{aligned}
  |\theta(P_n) - \theta(P)| &\le \int_{\mathbb{R}^d} |g| \, d|P_n - P| \\
  &\le \sup_{x \in \mathbb{R}^d} e^{-a\|x\|} |g(x)| \int_{\mathbb{R}^d} e^{a\|x\|} \, d|P_n - P| \\
  &= \sup_{x \in \mathbb{R}^d} e^{-a\|x\|} |g(x)| \, \|P_n - P\|_a \to 0
\end{aligned}
\]

as $n \to \infty$. Thus $\theta$ is continuous, as required. □